
feat: add seed_oss, deepseek_v31, qwen3_coder_xml tool parsers#13

Merged
raullenchai merged 7 commits into main from feat/upstream-tool-parsers on Mar 4, 2026

Conversation

@raullenchai
Owner

Summary

  • Port 3 upstream vLLM tool parsers for the most popular MLX models:
    • seed_oss (seed_oss, seed, gpt_oss): GPT-OSS-20B XML format with <seed:tool_call> + <seed:think> thinking blocks
    • deepseek_v31 (deepseek_v31, deepseek_r1_0528): DeepSeek V3.1/R1-0528 unicode special tokens (simpler than V3 — no code fence, no type prefix)
    • qwen3_coder_xml (qwen3_coder_xml, qwen3_xml): Qwen3-Coder XML format with <tool_call>/<function=...> and parameter type conversion
  • Add 72 upstream regression tests across all 3 parsers + registration tests
  • Update eval configs: evals/README.md server flags table; evals/run_all_models.sh GPT-OSS parser switched from minimax to seed_oss
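To make the qwen3_coder_xml wire format concrete, here is a minimal parsing sketch of the `<tool_call>`/`<function=...>` shape described above (illustrative only — the real parser also handles streaming and schema-driven parameter type conversion; `parse_qwen3_xml` and `SAMPLE` are hypothetical names):

```python
import re

# Hypothetical sample of the Qwen3-Coder XML tool-call format.
SAMPLE = (
    "<tool_call><function=get_weather>"
    "<parameter=city>Paris</parameter>"
    "<parameter=days>3</parameter>"
    "</function></tool_call>"
)

def parse_qwen3_xml(text):
    """Sketch: extract function name and raw string parameters."""
    fn = re.search(r"<function=([\w.]+)>(.*?)</function>", text, re.DOTALL)
    if fn is None:
        return None
    params = dict(
        re.findall(r"<parameter=(\w+)>(.*?)</parameter>", fn.group(2), re.DOTALL)
    )
    return {"name": fn.group(1), "arguments": params}
```

Note the sketch leaves parameters as strings; the shipped parser converts them per the tool schema.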

Motivation

MLX download rankings show these are among the most popular models.

Test plan

  • python3.12 -m pytest tests/test_upstream_regression.py -v — 72/72 pass
  • python3.12 -m pytest tests/test_tool_parsers.py tests/test_minimax_tool_parser.py -v — 143/143 pass (no regressions)
  • All parser aliases registered and discoverable
  • Manual test with GPT-OSS-20B using --tool-call-parser seed_oss (if server available)

🤖 Generated with Claude Code

Your Name and others added 7 commits March 4, 2026 10:21
Port 3 upstream vLLM tool parsers for popular MLX models:
- seed_oss: GPT-OSS-20B XML format (<seed:tool_call> + <seed:think>)
- deepseek_v31: DeepSeek V3.1/R1-0528 unicode special tokens
- qwen3_coder_xml: Qwen3-Coder XML format (<tool_call>/<function=...>)

Includes 72 upstream regression tests and eval config updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix GLM47 test_streaming_no_tool_calls to match current strip_think_tags
  behavior (strips leading whitespace from content deltas)
- Add multi-step streaming tests for seed_oss and qwen3coder that verify
  header + { + params + } are all emitted across multiple calls
- Add note that run_all_models.sh paths are machine-specific

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix GLM47 streaming: strip_think_tags was eating inter-word spaces on
  normal content deltas; now only strips when </think> is actually present
- Add multi-step streaming tests for seed_oss and qwen3coder that verify
  complete tool call emission (header + { + params + }) with fine-grained
  deltas matching realistic token boundaries
- Add note that run_all_models.sh paths are machine-specific
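The guard described above can be sketched as follows (illustrative names, not the repo's actual code — `apply_think_stripping` wraps whatever `strip_think_tags` implementation the parser uses):

```python
def apply_think_stripping(delta_text, strip_think_tags):
    """Sketch of the fix: only invoke strip_think_tags when the closing
    </think> tag is actually present in this delta, so ordinary content
    deltas keep their inter-word whitespace."""
    if "</think>" in delta_text:
        return strip_think_tags(delta_text)
    return delta_text  # normal content: pass through untouched
```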

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Streaming completeness (seed_oss + qwen3coder):
- When the function body is already complete at header-detection time,
  emit the full tool call (name + arguments) in one chunk instead of
  header-only.  Prevents truncated output when coarse deltas or
  max_tokens leave no further parser calls.
- When tool_call_start is detected, fall through to header parsing
  instead of returning None — the header may already be available.
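A minimal sketch of that emission logic, assuming a seed_oss/qwen3coder-style `<function=NAME>{json}</function>` body (names and regexes are illustrative, not the parser's real code):

```python
import json
import re

def emit_on_header(buffer):
    """Sketch: when the tool-call header is detected, check whether the
    body is already complete in the buffer. If so (coarse delta or
    max_tokens cut off further parser calls), emit name + arguments in
    one chunk; otherwise fall back to header-only."""
    header = re.search(r"<function=([\w.]+)>", buffer)
    if header is None:
        return None  # header not available yet; wait for more deltas
    body = re.search(r">\s*(\{.*\})\s*</function>", buffer, re.DOTALL)
    if body is not None:
        # Complete body already buffered: emit the full call in one chunk.
        return {"name": header.group(1), "arguments": json.loads(body.group(1))}
    return {"name": header.group(1)}  # header-only; arguments stream later
```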

GLM47 streaming:
- Only call strip_think_tags when </think> is actually present in the
  delta, preventing inter-word spaces from being eaten on normal content.

Tests:
- Add coarse-delta streaming tests that verify complete arguments are
  emitted even with a single large chunk (seed_oss + qwen3coder).
- Fix GLM47 streaming test to expect preserved whitespace.

Other:
- Remove misleading MODEL_DIR env var reference from run_all_models.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The GPT-OSS chat template generates tool calls as:
  <|start|>assistant to=functions.NAME<|channel|>commentary json<|message|>ARGS<|call|>

But the harmony regex expected:
  <|channel|>commentary to=functions.NAME <|message|>ARGS<|call|>

The to=functions.NAME comes before <|channel|>commentary in reality,
not after. This mismatch dropped the tool-calling score to 17%.

Fix: support both formats (real + legacy test format) via alternation.
Also accept <|end|> as final channel terminator alongside <|return|>.
Revert GPT-OSS eval config from seed_oss back to harmony.
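The alternation fix can be sketched like this (an illustrative regex, not the shipped one — the real parser also handles `<|end|>`/`<|return|>` channel terminators and streaming):

```python
import json
import re

# Accept to=functions.NAME either before <|channel|>commentary (what the
# GPT-OSS chat template actually emits) or after it (legacy test format).
TOOL_CALL_RE = re.compile(
    r"(?:to=functions\.(?P<pre>[\w.]+)<\|channel\|>commentary(?:\s+\w+)?"
    r"|<\|channel\|>commentary\s+to=functions\.(?P<post>[\w.]+)\s*)"
    r"<\|message\|>(?P<args>.*?)<\|call\|>",
    re.DOTALL,
)

def parse_harmony_tool_call(text):
    """Sketch: return the tool name and parsed JSON arguments, whichever
    ordering of to=functions.NAME the model produced."""
    m = TOOL_CALL_RE.search(text)
    if m is None:
        return None
    return {
        "name": m.group("pre") or m.group("post"),
        "arguments": json.loads(m.group("args")),
    }
```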

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Set HarmonyToolParser.SUPPORTS_NATIVE_TOOL_FORMAT = True so multi-turn
  tool history uses native harmony tokens instead of plain text conversion
  ("[Calling tool: ...]"), which broke GPT-OSS tool flow understanding.

- Extend load_model_with_fallback to catch "Missing N parameters" errors
  (not just "parameters not in model") for VLM-packaged models like
  Qwen3.5-9B and Mistral-Small-3.2 that need strict=False.

- Update harmony and native format tests accordingly.
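The extended fallback behaves roughly like this sketch (hypothetical: `load_fn` stands in for the actual model loader, and the error strings are the two mentioned above):

```python
import re

def load_model_with_fallback(path, load_fn):
    """Sketch: try a strict load first; retry with strict=False when the
    failure is a weight-name mismatch from a VLM-packaged checkpoint,
    which surfaces as either "parameters not in model" or
    "Missing N parameters"."""
    try:
        return load_fn(path, strict=True)
    except ValueError as e:
        msg = str(e)
        if ("parameters not in model" in msg
                or re.search(r"Missing \d+ parameters", msg)):
            return load_fn(path, strict=False)  # tolerate extra/missing weights
        raise  # unrelated error: propagate unchanged
```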

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add explicit parentheses in tokenizer.py fallback condition to clarify
  `or`/`and` precedence (behavior was correct but ambiguous to read).

- Fix _convert_param_value() in seed_oss and qwen3coder parsers: when
  schema says "number"/"float", always return float instead of silently
  coercing 3.0 → int(3). Removes lossy `fv - int(fv) != 0` check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@raullenchai raullenchai merged commit 7ffce24 into main Mar 4, 2026
@raullenchai raullenchai deleted the feat/upstream-tool-parsers branch March 4, 2026 21:10